Add a pluggable server extension API with MCP Apps #3003
Thread an `extensions` argument through the low-level `Server.get_capabilities` and `create_initialization_options` (mirroring `experimental`), backed by a `Server.extensions` attribute so the streamable-HTTP `server/discover` path advertises it too. Add an `extensions` branch to `Connection.check_capability` (presence-of-identifier, since settings are negotiated per-extension) and let a client advertise its own support via `Client(extensions=...)` / `ClientSession(extensions=...)`, mirrored into `ClientCapabilities.extensions`.
Introduce `Extension`, a narrow base class (HTTPX `Transport`/`Auth` style) whose methods default so an extension overrides only what it needs: `settings()`, `tools()`, `resources()`, `methods()`, and `intercept_tool_call()`. `MCPServer` accepts `extensions=[...]` at construction and `add_extension()` later, applying a closed set of contributions (tool/resource/method bindings) and composing every extension's `tools/call` interceptor into one `ServerMiddleware`. The server never hands itself to an extension; the extension declares what it adds as data.
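The "one `ServerMiddleware`" composition described above can be pictured as an onion of async callables. The following is a minimal sketch, assuming a hypothetical `(ctx, call_next)` interceptor signature and a plain dict standing in for the request context; `compose`, `Handler`, and `Interceptor` are illustrative names, not the SDK's API.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical shapes for illustration only; the SDK's real types may differ.
Handler = Callable[[dict[str, Any]], Awaitable[Any]]
Interceptor = Callable[[dict[str, Any], Handler], Awaitable[Any]]


def compose(interceptors: list[Interceptor], terminal: Handler) -> Handler:
    """Fold interceptors into one handler; the first listed runs outermost."""
    handler = terminal
    for interceptor in reversed(interceptors):
        # Bind the current values so each layer calls the layer beneath it.
        def layer(ctx, _i=interceptor, _next=handler):
            return _i(ctx, _next)

        handler = layer
    return handler


async def audit(ctx, call_next):
    ctx["trace"].append("audit:before")
    result = await call_next(ctx)
    ctx["trace"].append("audit:after")
    return result


async def stamp(ctx, call_next):
    ctx["trace"].append("stamp")
    result = await call_next(ctx)
    return {**result, "stamped": True}


async def call_tool(ctx):
    ctx["trace"].append("tool")
    return {"content": "ok"}


async def demo():
    ctx = {"trace": []}
    result = await compose([audit, stamp], call_tool)(ctx)
    return ctx["trace"], result
```

Running `asyncio.run(demo())` shows the ordering: `audit` wraps `stamp`, which wraps the tool itself, so an extension listed earlier sees the request first and the result last.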
`Apps` is an additive `Extension`: `@apps.tool(resource_uri=...)` binds a tool to a `ui://` UI resource via `_meta.ui.resourceUri`, `add_html_resource()` serves the HTML at `text/html;profile=mcp-app`, and `client_supports_apps(ctx)` gates the SEP-2133 text-only fallback. Drop the now-exercised `# pragma: no cover` on `TextResource.read()` (the Apps resource path covers it).
`Tasks` is an interceptive `Extension`: `intercept_tool_call` records a task-augmented `tools/call` and stamps the task id into `_meta[io.modelcontextprotocol/related-task]`, while `methods()` serves `tasks/get`, `tasks/result`, `tasks/cancel`, and `tasks/list` over an in-memory store. It demonstrates the interceptive seam; the augmented call returns a `CallToolResult` rather than `CreateTaskResult` because the `tools/call` result schema admits only `CallToolResult | InputRequiredResult` (TODO L56). Also add the negotiation-plumbing tests shared by both extensions.
Wire runnable `apps` and `tasks` stories (in-memory + http-asgi) into the manifest and document the extensions API in the migration guide.
Drop the public `MCPServer.add_extension`; extensions are fixed at construction via `extensions=[...]` (the apply logic moves to a private `_apply_extension`, with the `tools/call` interceptor composed once afterwards). This matches the declarative design and removes the mid-connection mutation footgun. Rework the tasks story around a `render_report` tool whose multi-step work motivates running it as a task, with named `_start_task` / `_get_task` / `_task_result` helpers so the client reads as a clear lifecycle.
3 issues found and verified against the latest diff
Make explicit that a plain tools/call is unchanged - only a call carrying a `task` field becomes a task - and document that per-tool gating on the declared `ToolExecution.task_support` is not enforced by this reference extension.
# Conflicts:
#	src/mcp/server/mcpserver/__init__.py
#	src/mcp/server/mcpserver/server.py
maxisbey left a comment
Went through this against the spec primary sources (SEP-2663, SEP-2133, the ext-apps spec, and schema/draft/schema.ts). The framework shape is good and Apps is basically right, but tasks.py is implementing the 2025-11-25 in-core Tasks design rather than the SEP-2663 extension that actually ships in 2026-07-28 — they diverge on almost every wire-observable detail.
Tasks (src/mcp/server/tasks.py) — implements the wrong spec version
Everything below is a divergence from SEP-2663 / schema/draft/schema.ts, cross-checked against the 2025-11-25 schema where the current behavior comes from.
Method set
- `methods()` registers `tasks/result` and `tasks/list`. Both were removed by SEP-2663: clients calling either MUST get `-32601`, and `tasks/list` reintroduces the cross-caller enumeration leak the SEP explicitly designed out. The 2026 set is exactly `{tasks/get, tasks/update, tasks/cancel}`.
- `tasks/update` is not registered, so there's no way for a client to deliver `inputResponses` for the `input_required` flow.
Opt-in and result envelope
- `intercept_tool_call` gates on `params.task` as the client opt-in. SEP-2663 says servers MUST ignore `params.task` (it's the legacy 2025 field); the server is the sole decider, gated only on whether the client declared `io.modelcontextprotocol/tasks` in its per-request capabilities.
- A task-augmented `tools/call` returns a `CallToolResult` with `_meta["io.modelcontextprotocol/related-task"]`. That `_meta` key is 2025-only (it's in `schema/2025-11-25/schema.ts:1330` and absent from `schema/draft`). The 2026 shape is a `CreateTaskResult` discriminated by `resultType: "task"` with `taskId`/`status`/`createdAt` flat on the result.
- `settings()` returns `{"list": {}, "cancel": {}}`, which is the removed legacy `capabilities.tasks` sub-shape leaking into extension settings. Per SEP-2663 and `schema/draft/examples/ServerCapabilities/extensions-tasks.json` the settings object is `{}`.
Lifecycle and shapes
- `tasks/get` returns only the flat `Task` snapshot. For `completed`/`failed`/`input_required` it has to inline `result`/`error`/`inputRequests` per the `DetailedTask` discriminated union; without that the client has no way to retrieve the tool output through the 2026 method set at all.
- `isError: true` from the tool routes to `status: "failed"`. SEP-2663 says an `isError: true` `CallToolResult` is a completed task whose `result` is that `CallToolResult`; `failed` is reserved for JSON-RPC `Error` objects only. Relatedly, `TaskStore.fail()` records no error payload, so there'd be nothing to surface for `failed.error` even after fixing `tasks/get`.
- `tasks/cancel` returns the full `Task` body. Spec says it's an empty `Result` ack (`resultType: "complete"`).
- Wire field names: emits `ttl`/`pollInterval` (2025 model fields); SEP-2663 renamed these to `ttlMs`/`pollIntervalMs`.
- No `input_required` state path anywhere: `TaskStore` has no transition into it and no `inputRequests` storage, so MRTR-over-tasks isn't implementable on this store.
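The status mapping from the lifecycle bullets can be sketched as below. This follows the review's reading of SEP-2663 and is not the SDK's implementation: `settle_task` and `JsonRpcError` are hypothetical names, and only the field names quoted in the review (`status`, `result`, `error`, `isError`) come from the source.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class JsonRpcError:
    """Hypothetical stand-in for a JSON-RPC error produced by the tool call."""

    code: int
    message: str


def settle_task(task: dict[str, Any], outcome: dict[str, Any] | JsonRpcError) -> dict[str, Any]:
    """Return the task record updated with the outcome of the tools/call."""
    if isinstance(outcome, JsonRpcError):
        # Only a JSON-RPC error fails the task, and the payload is kept so
        # that tasks/get has something to surface.
        return {
            **task,
            "status": "failed",
            "error": {"code": outcome.code, "message": outcome.message},
        }
    # A CallToolResult completes the task even when isError is true.
    return {**task, "status": "completed", "result": outcome}
```

With this shape, a tool that reports `isError: true` still ends as `completed`, and a client polling `tasks/get` can read the tool output from the inlined `result`.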
Security / robustness
- Task IDs are sequential `f"task-{n}"`. The spec requires sufficient entropy because the ID is a bearer capability for `tasks/get`/`tasks/cancel`.
- `_require()` returns any task by ID with no principal check; the spec says servers MUST authn/authz each task-related request.
- `tasks/get`/`tasks/cancel` don't check the per-request client extension capability; spec says non-declaring clients MUST get `-32021` with `data.requiredCapabilities`.
- If `await call_next(ctx)` raises, the task stays permanently `"working"` (no try/except around the call).
- `payload = result if isinstance(result, dict) else {}` silently drops a non-dict downstream result (e.g. a pydantic model from another middleware); the response becomes `{"_meta": {...}}` with the tool's `content` gone.
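Two of the robustness points above (unguessable IDs, and never stranding a task in `"working"`) fit in a few lines. This is a sketch only: `new_task_id` and `run_as_task` are hypothetical helpers, the store is a plain dict, and the record shape is an assumption rather than the spec's wire format.

```python
import asyncio
import secrets
from typing import Any


def new_task_id() -> str:
    # 32 random bytes, URL-safe encoded: unguessable, unlike a counter.
    return secrets.token_urlsafe(32)


async def run_as_task(store: dict[str, dict[str, Any]], call_next, ctx) -> str:
    """Run call_next under a task record that always leaves "working"."""
    task_id = new_task_id()
    store[task_id] = {"status": "working"}
    try:
        result = await call_next(ctx)
    except Exception as exc:
        # Without this branch a raising handler strands the task in "working".
        store[task_id] = {"status": "failed", "error": str(exc)}
    else:
        # Keep the downstream result as-is instead of dropping non-dict values.
        store[task_id] = {"status": "completed", "result": result}
    return task_id
```

A principal check and the per-request capability check would sit in front of any lookup by ID; they are left out here because the review does not describe how the SDK exposes the caller's identity.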
Construction
- `TaskStore` is hard-instantiated in `Tasks.__init__` with no injection seam, which contradicts the module's own docstring example `Tasks(store)` and means the in-memory store is the only option (problem for stateless HTTP).
- The default `clock` is `_fixed_clock` returning the constant `"1970-01-01T00:00:00Z"`, so out of the box every task's `createdAt`/`lastUpdatedAt` is the Unix epoch.
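An injection seam for both the store and the clock could look roughly like this. It is a sketch under assumptions: the `TaskStore` protocol methods, `InMemoryTaskStore`, `utc_now`, and the `create` helper are all hypothetical; only `Tasks(store)`, `clock`, and `createdAt` are named in the review.

```python
from __future__ import annotations

from collections.abc import Callable
from datetime import datetime, timezone
from typing import Any, Protocol


class TaskStore(Protocol):
    """Hypothetical storage interface; any backend with these methods fits."""

    def put(self, task_id: str, task: dict[str, Any]) -> None: ...

    def get(self, task_id: str) -> dict[str, Any]: ...


class InMemoryTaskStore:
    def __init__(self) -> None:
        self._tasks: dict[str, dict[str, Any]] = {}

    def put(self, task_id: str, task: dict[str, Any]) -> None:
        self._tasks[task_id] = task

    def get(self, task_id: str) -> dict[str, Any]:
        return self._tasks[task_id]


def utc_now() -> str:
    # Wall-clock default, so timestamps are never the Unix epoch.
    return datetime.now(timezone.utc).isoformat()


class Tasks:
    def __init__(self, store: TaskStore | None = None, clock: Callable[[], str] = utc_now) -> None:
        # Callers may pass a shared or persistent store; in-memory is a fallback.
        self.store = store if store is not None else InMemoryTaskStore()
        self.clock = clock

    def create(self, task_id: str) -> dict[str, Any]:
        task = {"taskId": task_id, "status": "working", "createdAt": self.clock()}
        self.store.put(task_id, task)
        return task
```

Tests can still pin time by passing a fixed `clock`, while the default reads the real clock.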
Tests and the example story lock the wrong shape in. tests/server/test_tasks.py asserts tasks/result/tasks/list are routable, asserts _meta[related-task], uses params.task as opt-in, asserts isError→failed, asserts a body on cancel, asserts ttl not ttlMs, and hard-codes taskId == "task-1". examples/stories/tasks/ does the same and the README's "Caveats" section frames returning CallToolResult+_meta instead of CreateTaskResult as a "deliberate simplification" — that's a spec violation in user-facing docs, not a simplification. A spec-conformant server would -32601 the example client.
Extension framework — four structural things
The declarative Extension shape itself is nice (and there's no precedent for it in the other SDKs, so this is the reference). Four things I'd want fixed before it lands:
- Layering. `Extension` is defined in `mcp/server/mcpserver/extension.py`, so helper-tier `mcp/server/apps.py` and `mcp/server/tasks.py` import upward from `mcp.server.mcpserver.*`. The base class belongs at `mcp/server/` with `MCPServer` composing it, so the dependency arrow points the right way (and so third-party extensions don't depend on the composition tier).
- `identifier` enforcement. It's a bare class annotation; a subclass that forgets it constructs fine and only blows up with `AttributeError` inside `_apply_extension`. A `__init_subclass__` check (and ideally `_meta`-key grammar validation, since the spec says extension IDs MUST carry a prefix) would make it fail at class-definition time.
- `MethodBinding` is version-blind. It carries no protocol-version field and registers into the flat `_request_handlers[method]` dict, so extension methods bypass the `(method, version)` boundary table the runner uses for core methods. An extension can't declaratively say "this method exists only at 2026-07-28"; it'd have to `if version == ...` inside the handler.
- No `-32021` raise seam. `Connection.check_capability` gets a boolean extensions branch, but there's no `require_client_extension(id)` (or similar) that raises `MissingRequiredClientCapabilityError` with `data.requiredCapabilities.extensions = {id: {}}`. The error type already exists in `mcp_types`; without the helper every extension author has to hand-construct it.
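The `__init_subclass__` check suggested in the second bullet could be as small as the sketch below. The `prefix/name` test is a deliberately simplified assumption, not the spec's `_meta`-key grammar, and this toy `Extension` is not the SDK's class.

```python
class Extension:
    """Toy base class; the prefix rule here is a simplified assumption."""

    identifier: str

    def __init_subclass__(cls, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        ident = getattr(cls, "identifier", None)
        if not isinstance(ident, str):
            raise TypeError(f"{cls.__name__} must define a string `identifier`")
        prefix, sep, name = ident.partition("/")
        if not (prefix and sep and name):
            raise TypeError(
                f"{cls.__name__}.identifier {ident!r} must have the form 'prefix/name'"
            )


class Demo(Extension):
    identifier = "com.example/demo"
```

A subclass that omits `identifier`, or sets one without a prefix, now raises `TypeError` on the `class` statement itself rather than later during server construction.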
Apps (src/mcp/server/apps.py) — looks right, a few additive gaps
The fundamentals match the ext-apps spec: EXTENSION_ID = "io.modelcontextprotocol/ui", text/html;profile=mcp-app, nested _meta.ui.resourceUri, server auto-advertises under capabilities.extensions. Nothing blocking. Smaller things:
- `client_supports_apps()` only checks key presence, never `mimeTypes`, so a client advertising `{"mimeTypes": ["application/x-something-else"]}` reads as HTML-capable. The ts reference checks `mimeTypes.includes(RESOURCE_MIME_TYPE)`.
- `@apps.tool()` has no `visibility` kwarg, so you can't set `_meta.ui.visibility: ["app"]`.
- `add_html_resource()` has no way to set `UIResourceMeta` (`csp`/`permissions`/`domain`/`prefersBorder`) on the registered resource.
- Passing `meta=` through `@apps.tool(..., meta={...})` is accepted by `**tool_kwargs` but raises `TypeError: got multiple values for keyword argument 'meta'` at server construction (`_apply_extension` calls `add_tool(tool.fn, meta=tool.meta, **tool.kwargs)`).
- Nits: no typed `McpUi*Meta` models (untyped dicts), no cross-check that every `resource_uri` actually has a registered resource, no opt-out from `resources/list` for UI-only resources.
- One open question: ts-sdk and csharp both ship Apps in a separate package (`@modelcontextprotocol/ext-apps`, `ModelContextProtocol.Extensions.Apps`). Is in-core `mcp.server.apps` deliberate? If so worth a line in the module docstring.
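For the first bullet, the stricter check amounts to looking inside the extension's settings rather than only at the key. A minimal sketch, assuming the client's extension map is available as a plain dict (the real helper takes `ctx`; `supports_apps` is a hypothetical name):

```python
from __future__ import annotations

from typing import Any

EXTENSION_ID = "io.modelcontextprotocol/ui"
RESOURCE_MIME_TYPE = "text/html;profile=mcp-app"


def supports_apps(client_extensions: dict[str, Any] | None) -> bool:
    """True only if the client lists the app MIME type under the Apps extension."""
    settings = (client_extensions or {}).get(EXTENSION_ID)
    if not isinstance(settings, dict):
        return False
    mime_types = settings.get("mimeTypes")
    # A missing or non-list mimeTypes counts as unsupported.
    return isinstance(mime_types, list) and RESOURCE_MIME_TYPE in mime_types
```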
Validating it's right
Neither extension currently has external proof-of-correctness:
- Conformance. The harness has 10 SEP-2663 `tasks-*` server scenarios but they're in the `pending` list and no python-sdk CI leg selects them; `mcp-everything-server` doesn't mount `Tasks()`. To make this PR externally verifiable: add a `--scenario 'tasks-*'` leg to `conformance.yml`, add an `--extensions tasks` flag to the everything-server fixture, and seed an `expected-failures.tasks.yml`. After the SEP-2663 rewrite, these seven should be green: `tasks-dispatch-and-envelope`, `tasks-capability-negotiation`, `tasks-required-task-error`, `tasks-wire-fields`, `tasks-lifecycle`, `tasks-mrtr-input`, `tasks-request-state-removal`. There are zero Apps scenarios in the harness, which is worth proposing upstream.
- Interaction tests. I'd write the spec-derived `tests/interaction/mcpserver/test_tasks.py` first (one `@requirement` per SEP clause: settings shape `{}`, non-declaring client gets inline result, non-declaring `tasks/get` → `-32021`, `resultType:"task"` flat, `isError` → `completed`, `tasks/cancel` empty ack, `tasks/result`/`tasks/list` → `-32601`, `ttlMs` field name, ID entropy, `tasks/update` ack, store injectable). They'll be red on this branch; that's the bar the rewrite hits.
- Stories smoke. The `tests/examples/test_stories{,_smoke}.py` infrastructure already runs both stories over in-memory + real HTTP. Once `examples/stories/tasks/` is rewritten to the 2026 shape it becomes an end-to-end smoke test for free. Adding a wire-shape assertion (raw response has `"resultType": "task"` and no `related-task` `_meta` key) would have caught every Tasks issue above.
- Cross-SDK. The conformance leg covers py-server↔ts-client. For py-client, csharp-sdk is currently the only other SDK with a working Tasks runtime (`McpServerImpl.cs` returns `resultType:"task"`), so that's the interop peer for the polymorphic-result handling.
Suggested split
Given the size of the Tasks delta, I'd land framework + Apps here (after the four framework fixes and the Apps minors) and take Tasks as a follow-up PR rewritten to SEP-2663, with the interaction tests landing red first and the conformance leg wired in the same PR.
The Tasks implementation was built against the 2025-11-25 in-core design still
carried (types-only) in mcp_types, not SEP-2663 (the extension that ships in
2026-07-28). They diverge on nearly every wire-observable detail: SEP-2663 makes
the server the sole decider (ignoring the legacy params.task), uses the
{tasks/get, tasks/update, tasks/cancel} method set (no tasks/list or
tasks/result), returns a CreateTaskResult discriminated by resultType: "task"
(not a CallToolResult with _meta), advertises {} settings, gates on
execution.taskSupport, and renames ttl/pollInterval to ttlMs/pollIntervalMs.
Remove the extension, its tests, and its story rather than ship a spec-violating
example; restore tasks to the deferred manifest list with a SEP-2663 pointer. The
generic Extension API and the Apps reference extension are unaffected and still
at 100% coverage. Tasks returns as a separate PR rewritten to SEP-2663 with the
conformance tasks-* scenarios wired in.
…Apps fixes
Framework:
- Move the Extension base class from mcp/server/mcpserver/extension.py to mcp/server/extension.py so helper-tier modules (apps.py) and third-party extensions depend on the base, not the composition tier.
- Enforce a vendor-prefix/name identifier via __init_subclass__ (and at apply time for per-instance identifiers), failing at class-definition rather than late with AttributeError.
- Add MethodBinding.protocol_versions so an extension method can be scoped to specific wire versions; out-of-range requests get METHOD_NOT_FOUND.
- Add require_client_extension(ctx, identifier) raising the -32021 missing required client capability error with a requiredCapabilities payload.
Apps:
- client_supports_apps now checks the client advertised the text/html;profile=mcp-app MIME type, not just the extension key.
- Add a visibility kwarg to @apps.tool (_meta.ui.visibility).
- Let add_html_resource set csp/permissions/domain/prefers_border on the resource _meta via typed ResourceCsp/ResourcePermissions models.
- Fix the meta= double-keyword TypeError by making meta an explicit param merged with the ui entry instead of passing through **tool_kwargs.
4 issues found across 8 files (changes from recent commits).
…Apps checks
- Validate extension identifiers against the spec's _meta key grammar (per-label structure, fullmatch so a trailing newline cannot pass)
- Reject MethodBindings that name spec-defined request methods, collide with an already-registered handler, or pin an empty version set; the runner's per-version surface gate would never route those anyway
- client_supports_apps requires mimeTypes to list text/html;profile=mcp-app, matching the reference implementation; a missing key means unsupported
- Require every @apps.tool resource_uri to resolve to a resource registered on the Apps instance, failing at construction instead of 404ing on resources/read; add Apps.add_resource for pre-built ui:// resources
- Document the new construction-time errors in the migration guide
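The "fullmatch so a trailing newline cannot pass" detail is easy to reproduce. In Python, `$` also matches just before a final newline, so an anchored `re.match` accepts an identifier with `\n` appended while `fullmatch` rejects it. The pattern below is a simplified stand-in for the real grammar (dot-separated labels, a slash, then a name), labeled as an assumption.

```python
import re

# Simplified stand-in grammar, not the spec's exact _meta key rules.
_LABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_NAME = r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"
_IDENTIFIER = re.compile(rf"{_LABEL}(?:\.{_LABEL})*/{_NAME}")


def is_valid_identifier(value: str) -> bool:
    # fullmatch requires the pattern to consume the entire string.
    return _IDENTIFIER.fullmatch(value) is not None


def is_valid_identifier_loose(value: str) -> bool:
    # Anti-example: `$` also matches just before a trailing newline.
    return re.match(_IDENTIFIER.pattern + r"$", value) is not None
```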
I didn't find any new issues in this revision — the earlier feedback (Tasks removal, meta merging, the mimeTypes check, the stale extension.py docstring, identifier grammar, and method-collision enforcement) all looks addressed. That said, this introduces a new public extension API surface and SEP-2133 capability negotiation, so it's worth a maintainer's review of the design before merge.
Overview
This PR adds a pluggable, opt-in server extension API (SEP-2133): a new mcp.server.extension module (Extension, ToolBinding/ResourceBinding/MethodBinding, compose_tool_call_interceptor, identifier validation), MCPServer(extensions=[...]) wiring, an extensions capability map threaded through the lowlevel Server (get_capabilities/create_initialization_options), client-side Client(extensions=...)/ClientSession(extensions=...), Connection.check_capability extension support, the Apps reference extension (io.modelcontextprotocol/ui), an example story, migration-guide docs, and ~880 lines of new tests. The originally-included Tasks extension was dropped in favor of a SEP-2663 follow-up.
Security risks
Low. The extension mechanism is construction-time and author-controlled — extensions are passed explicitly to MCPServer(...), never loaded dynamically. The PR adds guards that reduce risk: extension identifiers must match the spec's _meta grammar, MethodBinding cannot name spec methods, method-name collisions and duplicate identifiers raise at construction, and ui:// scheme enforcement plus the resource-registration check prevent dangling _meta.ui.resourceUri references. require_client_extension and client_supports_apps only gate behavior on client-declared capabilities; no auth, crypto, or input-parsing paths are touched. The capability-negotiation change in Connection.check_capability is presence-based and additive.
Level of scrutiny
High — not because of any specific defect, but because this defines a new public API surface (the Extension base class shape, the closed contribution set, the extensions= constructor parameter, new mcp.server.mcpserver exports) and a wire-visible capability map that other SDKs and the spec ecosystem will need to stay consistent with. Those are design decisions a maintainer should own; shadow approval is not appropriate for an API of this scope regardless of implementation quality.
Other factors
The bug-hunting pass on the current revision found no new issues, and all earlier review feedback (mine and cubic's) appears addressed in commits 0f440b1, cb2c456, and bb9f793 — the meta-kwarg merge, the stricter mimeTypes check, the identifier grammar, the method-collision guard, and removal of the stale Tasks references. Test coverage is thorough (e2e in-memory client tests for additive/interceptive extensions, version gating, capability negotiation, and Apps wiring), and the example story plus migration docs match the shipped behavior I spot-checked.
- docs/advanced/extensions.md: using and writing extensions (identifier grammar, contributions, vendor methods, version pinning, the tools/call interceptor, client-side declaration), backed by runnable docs_src examples with tests proving every claim
- docs/advanced/apps.md: the two-part Apps model, graceful degradation and the meaningful-content rule, CSP/permissions field tables, visibility semantics, the construction-time checks, and the add_resource escape hatch
- examples/stories/extensions: a custom extension end to end (settings entry, contributed tool, vendor method gated on the client declaring the extension back)
- Apps.tool() rejects a caller-supplied 'ui' meta key instead of silently clobbering it; add_resource defaults the app MIME type and rejects an explicit mismatch; resource _meta.ui pinned on both list entry and read content item
A request-sized parameter in example code should demonstrate input bounding: Field(ge=, le=) rejects an absurd limit before the handler allocates anything for it.
1 issue found across 19 files (changes from recent commits).
@apps.tool(resource_uri="ui://clock/app.html", description="The current time.")
def get_time(ctx: Context) -> str:
    now = "2026-06-26T12:00:00Z"
P3: The clock tutorial returns a hard-coded timestamp, so copied example code never reports the current time. Compute now at call time instead.
At docs_src/apps/tutorial001.py, line 22:
<file context>
@@ -0,0 +1,30 @@
+
+@apps.tool(resource_uri="ui://clock/app.html", description="The current time.")
+def get_time(ctx: Context) -> str:
+ now = "2026-06-26T12:00:00Z"
+ if not client_supports_apps(ctx):
+ return f"The time is {now}."
</file context>
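The fix the comment asks for is to read the clock inside the function. A minimal sketch of just the text-fallback branch, using only the standard library; `time_sentence` is a hypothetical helper name, and the Apps-specific branch is omitted because the excerpt does not show it.

```python
from datetime import datetime, timezone


def time_sentence() -> str:
    # Read the clock on every call so the reply is never a frozen literal.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"The time is {now}."
```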
The extensions and MCP Apps pages had inline Python fragments that CI never executed. Every block now includes a docs_src module: the client programs are async main()s run by tests/docs_src, the identifier and usage fragments are real files, and the FileResource example serves a checked-in report.html. Pages re-include the same file with different hl_lines instead of repeating code.
I didn't find any new issues in the current revision — the points raised on earlier revisions (the Apps meta merge, the stricter client_supports_apps mimeTypes check, the identifier grammar, method-collision rejection, and the stale Tasks references after the Tasks drop) all appear addressed. That said, this introduces a new public extension API surface (SEP-2133) plus capability-negotiation changes, so it warrants a human look at the API design before merge.
Overview
This PR adds a pluggable, opt-in server extension API (mcp.server.extension.Extension, MCPServer(extensions=[...])) with MCP Apps (io.modelcontextprotocol/ui) as the reference extension. It threads an SEP-2133 extensions capability map through the lowlevel Server, get_capabilities/create_initialization_options, Connection.check_capability, and the client side (Client(extensions=...) / ClientSession), and adds extensive docs (docs/advanced/extensions.md, apps.md), tested docs_src tutorials, runnable example stories, and thorough test coverage (test_extension.py, test_apps.py, test_extensions_capability.py). The originally-included Tasks extension was dropped and deferred to a SEP-2663 follow-up.
Security risks
Risk is moderate-to-low. Extensions are construction-time, author-controlled inputs and the contribution surface is closed (tools/resources/methods/one interceptor); spec-method shadowing and method collisions are rejected at construction, and identifiers are validated against the _meta grammar. The capability-check addition in Connection.check_capability is presence-based and conservative. require_client_extension and client_supports_apps gate behaviour on client-declared capabilities, which are advisory rather than security boundaries — that matches the spec's model. No auth/crypto code is touched.
Level of scrutiny
High. This is new public API in the core SDK (mcp.server.extension, exports from mcp.server.mcpserver, a new Client constructor argument, and a new field on the lowlevel Server), implementing a SEP whose final shape matters for downstream SDK parity. The design choices (closed contribution set, no server reference handed to extensions, single tools/call interception seam, version-gated method bindings) are exactly the kind of large-scale design decisions a maintainer should sign off on, regardless of implementation correctness.
Other factors
Earlier review feedback from both bots was addressed across several commits: the Apps.tool(meta=...) duplicate-keyword crash, the lenient client_supports_apps missing-mimeTypes behaviour, the over-permissive identifier regex, silent method-handler replacement, and the stale Tasks docstring/example references are all fixed in the current diff, with tests pinning each behaviour. Test coverage is strong (in-memory e2e tests, docs_src tests, parametrised grammar/negotiation tests). I found no remaining correctness issues, so the deferral is purely about the size and API-design weight of the change, not about known defects.
Summary
A pluggable, opt-in extension API for `MCPServer` (SEP-2133), with MCP Apps as the reference extension. (Tasks was dropped from this PR; see below.)
An extension is a narrow base class (HTTPX `Transport`/`Auth` style) whose methods default, so it overrides only what it needs:
You opt in declaratively at construction:
The server applies a closed set of contribution kinds and never hands itself to an extension.
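As a rough picture of that shape, here is a self-contained toy model. It does not import the SDK: `ToyServer`, `Greeter`, and this cut-down `Extension`/`ToolBinding` pair are hypothetical stand-ins, and only the ideas (defaulted hooks, `extensions=[...]` at construction, contributions returned as data) come from the description above.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolBinding:
    name: str
    fn: Callable[..., Any]


class Extension:
    """Toy base: every hook has a default, so subclasses override selectively."""

    identifier: str = ""

    def settings(self) -> dict[str, Any]:
        return {}

    def tools(self) -> list[ToolBinding]:
        return []


class ToyServer:
    """Hypothetical server that reads contributions as data."""

    def __init__(self, extensions: Sequence[Extension] = ()) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}
        self.capabilities: dict[str, dict[str, Any]] = {}
        for ext in extensions:
            # The extension never receives `self`; it only returns values.
            self.capabilities[ext.identifier] = ext.settings()
            for binding in ext.tools():
                self.tools[binding.name] = binding.fn


class Greeter(Extension):
    identifier = "com.example/greeter"

    def tools(self) -> list[ToolBinding]:
        return [ToolBinding("greet", lambda name: f"Hello, {name}!")]


server = ToyServer(extensions=[Greeter()])
```

`Greeter` overrides only `tools()`; its settings entry falls back to the default `{}`, and the server ends up with one advertised extension and one registered tool.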
What's included
- `ServerCapabilities.extensions` (SEP-2133): threaded through `get_capabilities`/`create_initialization_options`, advertised over `server/discover`, checked in `Connection.check_capability`, and advertised client-side via `Client(extensions=...)`.
- `Apps` (`io.modelcontextprotocol/ui`), additive: `@apps.tool(resource_uri=...)` binds a tool to a `ui://` resource via `_meta.ui.resourceUri`; `client_supports_apps(ctx)` drives the SEP-2133 text-only fallback.
- `apps` example story (in-memory + http-asgi) and a migration-guide entry.
Tasks deferred to a follow-up (SEP-2663)
This PR originally included a Tasks extension, but it was built against the 2025-11-25 in-core Tasks design still carried (types-only) in `mcp_types`, not SEP-2663, the extension that actually ships in 2026-07-28. They diverge on nearly every wire-observable detail (server-decided augmentation vs `params.task`; `{tasks/get, tasks/update, tasks/cancel}` vs `tasks/list` + `tasks/result`; `CreateTaskResult`/`resultType: "task"` vs `CallToolResult` + `_meta`; `execution.taskSupport` gating; `ttlMs`). Rather than ship a spec-violating example, Tasks is removed here and returns as a separate PR rewritten to SEP-2663 with the conformance `tasks-*` scenarios wired in.
Design notes
The shape is the pluggable-interface pattern: declarative wiring (`extensions=[...]`), a narrow named interface with defaults, behaviour flowing through as plain values. It was chosen over a generic plugin framework (no other official SDK builds one) and over an `app=` kwarg on `@tool` (which would couple core `MCPServer` to one extension).
Testing
In-memory `Client(server)` e2e tests, 100% coverage maintained, `strict-no-cover` clean, pyright + ruff + markdownlint green. The `apps` story legs pass.
AI Disclaimer
This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.